49 research outputs found

    Return of Frustratingly Easy Domain Adaptation

    Unlike human learning, machine learning often fails to handle changes between training (source) and test (target) input distributions. Such domain shifts, common in practical scenarios, severely damage the performance of conventional machine learning methods. Supervised domain adaptation methods have been proposed for the case when the target data have labels, including some that perform very well despite being "frustratingly easy" to implement. However, in practice, the target domain is often unlabeled, requiring unsupervised adaptation. We propose a simple, effective, and efficient method for unsupervised domain adaptation called CORrelation ALignment (CORAL). CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. Even though it is extraordinarily simple--it can be implemented in four lines of Matlab code--CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. Comment: Fixed typos. Full paper to appear in AAAI-16. Extended abstract of the full paper to appear in the TASK-CV 2015 workshop.
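As an illustration of what aligning second-order statistics can look like, here is a minimal NumPy sketch that whitens the source features with the source covariance and then re-colors them with the target covariance. The function names `coral` and `_sym_matrix_power`, the regularization parameter `eps`, and the row-per-sample feature layout are assumptions made for this sketch; it is not the authors' Matlab code.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power.

    Uses the eigendecomposition mat = V diag(w) V^T, so that
    mat**power = V diag(w**power) V^T.
    """
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral(source, target, eps=1.0):
    """Align source features to the target covariance (hypothetical helper).

    source: array of shape (n_source, d), one sample per row, d >= 2
    target: array of shape (n_target, d), unlabeled
    eps:    assumed regularizer added to the diagonal so that both
            covariance matrices are invertible
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    # Step 1: whiten the source features (remove source correlations).
    whitened = source @ _sym_matrix_power(cov_s, -0.5)
    # Step 2: re-color with the target correlations.
    return whitened @ _sym_matrix_power(cov_t, 0.5)
```

A classifier trained on `coral(source, target)` together with the source labels could then be applied to the raw target features. No target labels are used at any point, and means are left untouched because only covariances are aligned.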

    Influence of Polygonal Wear on Dynamic Performance of Wheels on High-Speed Trains

    With increases in train speed and traffic density, polygonal wear of railway wheels arises, induced by the high impacts between wheels and rails, and it bears mainly on the operational safety and ride comfort of the vehicle system. This work evaluates the effect of wheel polygon shape on the dynamic performance of the wheel set through numerical simulations. A finite element model, which includes the wheel set and the slab track, was established using ANSYS software to study the effects of polygonal wear on the dynamic behavior of the railway wheel. In the model, wheel–rail interaction forces caused by polygon wheel shape were solved using Universal Mechanisms of wear and were then entered into the finite element model. Using the simulation model, the influence of the harmonic order and out-of-roundness (OOR) amplitude of the wheel polygon on the transient dynamic behaviors of the wheels, namely the displacement, acceleration, and von Mises equivalent stress, was investigated. The results indicate that both the maximum dynamic displacement and the von Mises equivalent stress of the wheel plate are proportional to the OOR amplitude, the harmonic order, and the vehicle velocity. In addition, the maximum von Mises equivalent stress occurs close to the wheel center, whereas the maximum displacement occurs close to the wheel tread. The findings will provide a theoretical basis for on-board detection methods for monitoring wheel polygonal wear.

    Learning Deep Object Detectors from 3D Models

    Crowdsourced 3D CAD models are becoming easily accessible online, and can potentially generate an infinite number of training images for almost any object category. We show that augmenting the training data of contemporary Deep Convolutional Neural Net (DCNN) models with such synthetic data can be effective, especially when real training data is limited or not well matched to the target domain. Most freely available CAD models capture 3D shape but are often missing other low-level cues, such as realistic object texture, pose, or background. In a detailed analysis, we use synthetic CAD-rendered images to probe the ability of DCNN to learn without these cues, with surprising findings. In particular, we show that when the DCNN is fine-tuned on the target detection task, it exhibits a large degree of invariance to missing low-level cues, but, when pretrained on generic ImageNet classification, it learns better when the low-level cues are simulated. We show that our synthetic DCNN training approach significantly outperforms previous methods on the PASCAL VOC2007 dataset when learning in the few-shot scenario and improves performance in a domain shift scenario on the Office benchmark.

    LOWA: Localize Objects in the Wild with Attributes

    We present LOWA, a novel method for localizing objects with attributes effectively in the wild. It aims to address the insufficiency of current open-vocabulary object detectors, which are limited by the lack of instance-level attribute classification and rare class names. To train LOWA, we propose a hybrid vision-language training strategy to learn object detection and recognition with class names as well as attribute information. With LOWA, users can not only detect objects with class names, but also localize objects by attributes. LOWA is built on top of a two-tower vision-language architecture and consists of a standard vision transformer as the image encoder and a similar transformer as the text encoder. To learn the alignment between visual and text inputs at the instance level, we train LOWA with three training steps: object-level training, attribute-aware learning, and free-text joint training of objects and attributes. This hybrid training strategy first ensures correct object detection, then incorporates instance-level attribute information, and finally balances the object class and attribute sensitivity. We evaluate our model's performance on attribute classification and attribute localization on the Open-Vocabulary Attribute Detection (OVAD) benchmark and the Visual Attributes in the Wild (VAW) dataset, and experiments indicate strong zero-shot performance. Ablation studies additionally demonstrate the effectiveness of each training step of our approach.

    Railway Polygonized Wheel Detection Based on Numerical Time-Frequency Analysis of Axle-Box Acceleration

    The increasing need for repairs of polygonized wheels on high-speed railways in China is becoming problematic. At high speeds, polygonized wheels cause abnormal vibrations at the wheel-rail interface that can be detected via axle-box accelerations. To investigate the quantitative relationship between axle-box acceleration and wheel polygonization in both the time and frequency domains under high-speed conditions, a dynamics model that considers both wheel and track flexibility was developed to simulate the vehicle-track coupling system. The calculated axle-box accelerations were analyzed using the improved ensemble empirical mode decomposition and the Wigner-Ville distribution time-frequency method. The numerical results show that the maximum axle-box accelerations and their frequencies are quantitatively related to the harmonic order and out-of-roundness amplitude of polygonized wheels. In addition, measuring the axle-box acceleration enables both the detection of wheel polygonization and the identification of the degree of damage. Document type: Article
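To make the link between harmonic order, speed, and excitation frequency concrete, here is a small kinematic sketch. It assumes an idealized wheel whose radius deviates from nominal by a single sinusoidal harmonic and which rolls without slip; the function names and parameters are hypothetical, and this is not the coupled vehicle-track dynamics model described in the abstract.

```python
import math


def polygon_passing_frequency(harmonic_order, speed_kmh, wheel_diameter_m):
    """Excitation frequency in Hz of an idealized polygonized wheel.

    A wheel rolling without slip rotates speed / (pi * diameter) times per
    second, and a polygon of order N produces N radius peaks per revolution.
    """
    speed_ms = speed_kmh / 3.6
    rotation_hz = speed_ms / (math.pi * wheel_diameter_m)
    return harmonic_order * rotation_hz


def radial_deviation(theta_rad, harmonic_order, amplitude_m):
    """Radius deviation from nominal at wheel angle theta.

    Single-harmonic out-of-roundness profile (an assumption of this sketch).
    """
    return amplitude_m * math.sin(harmonic_order * theta_rad)
```

Under these assumptions the excitation frequency scales linearly with both the harmonic order and the train speed, which is one simple reason the dominant frequencies in an axle-box acceleration signal can carry information about the polygon order.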

    Evaluation and Mitigation of Agnosia in Multimodal Large Language Models

    While Multimodal Large Language Models (MLLMs) are widely used for a variety of vision-language tasks, one observation is that they sometimes misinterpret visual inputs or fail to follow textual instructions even in straightforward cases, leading to irrelevant responses, mistakes, and ungrounded claims. This observation is analogous to a phenomenon in neuropsychology known as agnosia, an inability to correctly process sensory modalities and recognize things (e.g., objects, colors, relations). In our study, we adapt this concept to define "agnosia in MLLMs", and our goal is to comprehensively evaluate and mitigate such agnosia in MLLMs. Inspired by the diagnosis and treatment process in neuropsychology, we propose a novel framework, EMMA (Evaluation and Mitigation of Multimodal Agnosia). In EMMA, we develop an evaluation module that automatically creates fine-grained and diverse visual question answering examples to assess the extent of agnosia in MLLMs comprehensively. We also develop a mitigation module to reduce agnosia in MLLMs through multimodal instruction tuning on fine-grained conversations. To verify the effectiveness of our framework, we evaluate and analyze agnosia in seven state-of-the-art MLLMs using 9K test samples. The results reveal that most of them exhibit agnosia across various aspects and degrees. We further develop a fine-grained instruction set and tune MLLMs to mitigate agnosia, which leads to notable improvement in accuracy.

    Phylophenetic properties of metabolic pathway topologies as revealed by global analysis

    BACKGROUND: As phenotypic features derived from heritable characters, the topologies of metabolic pathways contain both phylogenetic and phenetic components. In the post-genomic era, it is possible to measure the "phylophenetic" contents of different pathway topologies from a global perspective. RESULTS: We reconstructed phylophenetic trees for all available metabolic pathways based on topological similarities, and compared them to the corresponding 16S rRNA-based trees. Similarity values for each pair of trees ranged from 0.044 to 0.297. Using the quartet method, single-pathway trees were merged into a comprehensive tree containing information from a large part of the entire metabolic network. This tree showed considerably higher similarity (0.386) to the corresponding 16S rRNA-based tree than any tree based on a single pathway, but was, on the other hand, sufficiently distinct to preserve unique phylogenetic information not reflected by the 16S rRNA tree. CONCLUSION: We observed that the topologies of different metabolic pathways provided different phylogenetic and phenetic information, depicting the compromise between phylogenetic information and the varying evolutionary pressures that shape metabolic pathway topologies in different organisms. The phylogenetic information content of the comprehensive tree is substantially higher than that of any tree based on a single pathway, which also gives clues to constraints acting on the topology of the global metabolic network, information that is only partly reflected by the topologies of individual metabolic pathways.

    Joint Adversarial Domain Adaptation

    Domain adaptation aims to transfer the enriched label knowledge from large amounts of source data to unlabeled target data. It has raised significant interest in multimedia analysis. Existing research mainly focuses on learning domain-wise transferable representations via statistical moment matching or adversarial adaptation techniques, while ignoring the class-wise mismatch across domains, resulting in inaccurate distribution alignment. To address this issue, we propose a Joint Adversarial Domain Adaptation (JADA) approach to simultaneously align domain-wise and class-wise distributions across source and target in a unified adversarial learning process. Specifically, JADA attempts to solve two complementary minimax problems jointly. The feature generator aims not only to fool the well-trained domain discriminator in order to learn domain-invariant features, but also to minimize the disagreement between the predictions of two distinct task-specific classifiers, so that target features are synthesized near the support of the source in a class-wise manner. As a result, the learned transferable features will be equipped with more discriminative structures, and effectively avoid mode collapse. Additionally, JADA enables efficient end-to-end training via a simple back-propagation scheme. Extensive experiments on several real-world cross-domain benchmarks, including VisDA-2017, ImageCLEF, Office-31 and digits, verify that JADA can gain remarkable improvements over other state-of-the-art deep domain adaptation approaches.